Loop Tiling in Large-Scale Stencil Codes at Run-Time with OPS



Similar resources

Supplementary Material: Loop Tiling in Large-Scale Stencil Codes at Run-time with OPS

if(dir == g_xdir) {
  if(sweep_number == 1) {
    ops_par_loop(advec_cell_kernel1_xdir, "advec_cell_kernel1_xdir", clover_grid, 2, rangexy,
        ops_arg_dat(work_array1, 1, S2D_00, "double", OPS_WRITE),
        ops_arg_dat(work_array2, 1, S2D_00, "double", OPS_WRITE),
        ops_arg_dat(volume, 1, S2D_00, "double", OPS_READ),
        ops_arg_dat(vol_flux_x, 1, S2D_00_P10, "double", OPS_READ),
        ops_arg_dat(vol_flux_y, 1, S2D_00_0...
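The excerpt above passes each dataset to ops_par_loop together with a stencil: S2D_00 for plain point access and S2D_00_P10 for vol_flux_x. As an illustration only, the following C sketch shows the kind of nested loop such a 2D parallel loop conceptually expands to, assuming (as the names suggest) that S2D_00 denotes the single offset (0,0), that S2D_00_P10 denotes the offsets (0,0) and (+1,0), and that the range is given as {x_begin, x_end, y_begin, y_end} with exclusive ends. The function par_loop_sketch, the helper idx and the kernel body are hypothetical placeholders; this is not OPS's generated code and not the CloverLeaf kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Row-major 2D index helper (hypothetical, for illustration only). */
static size_t idx(int x, int y, int nx) {
    return (size_t)y * (size_t)nx + (size_t)x;
}

/* Conceptual expansion of a 2D parallel loop over range = {x0, x1, y0, y1},
   ends exclusive. All arrays are assumed to share the row width nx.
   out and vol are accessed only at offset (0,0); flux_x is read at offsets
   (0,0) and (+1,0). The arithmetic is a hypothetical placeholder. */
void par_loop_sketch(const int range[4], int nx,
                     double *out, const double *vol, const double *flux_x) {
    /* The (+1,0) offset reads one column past the range, so it must exist. */
    assert(range[0] >= 0 && range[2] >= 0 && range[1] < nx);
    for (int y = range[2]; y < range[3]; ++y)
        for (int x = range[0]; x < range[1]; ++x)
            out[idx(x, y, nx)] = vol[idx(x, y, nx)]
                               + flux_x[idx(x + 1, y, nx)]  /* offset (+1,0) */
                               - flux_x[idx(x, y, nx)];     /* offset (0,0)  */
}
```

The stencil declarations are what let a library reason about which neighbouring cells each loop reads; that dependency information is what a run-time scheme needs before it can tile across a sequence of such loops.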


Writing productive stencil codes with overlapped tiling

Stencil computations constitute the kernel of many scientific applications. Tiling is often used to improve the performance of stencil codes for data locality and parallelism. However, tiled stencil codes typically require shadow regions, whose management becomes a burden to programmers. In fact, it is often the case that the code required to manage these regions, and in particular their ...
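The abstract above refers to the shadow regions that tiled stencil codes require. As an illustration only, the following C sketch applies overlapped tiling to a hypothetical 1D three-point averaging stencil with fixed boundary values: each tile is extended by a shadow region as wide as the number of time steps, so every tile can advance several steps on its own private buffer at the cost of recomputing the overlap. The names sweep, stencil_reference and stencil_overlapped_tiled are hypothetical, and this is a generic textbook form of the technique, not the method of the cited paper or of OPS.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One sweep of a 3-point averaging stencil from a[0..len) into b[0..len).
   The first and last entries are copied unchanged (fixed boundary). */
static void sweep(const double *a, double *b, int len) {
    b[0] = a[0];
    b[len - 1] = a[len - 1];
    for (int i = 1; i < len - 1; ++i)
        b[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0;
}

/* Untiled reference: apply `steps` sweeps to the whole domain of n points. */
void stencil_reference(const double *in, double *out, int n, int steps) {
    assert(n > 0 && steps >= 0);
    double *a = malloc((size_t)n * sizeof *a);
    double *b = malloc((size_t)n * sizeof *b);
    assert(a && b);
    memcpy(a, in, (size_t)n * sizeof *a);
    for (int t = 0; t < steps; ++t) {
        sweep(a, b, n);
        double *tmp = a; a = b; b = tmp;   /* ping-pong buffers */
    }
    memcpy(out, a, (size_t)n * sizeof *a);
    free(a);
    free(b);
}

/* Overlapped tiling: each tile [lo, hi) is computed independently on a
   private buffer covering [lo - steps, hi + steps), clipped to the domain. */
void stencil_overlapped_tiled(const double *in, double *out,
                              int n, int steps, int tile) {
    assert(n > 0 && steps >= 0 && tile > 0);
    for (int lo = 0; lo < n; lo += tile) {
        int hi = (lo + tile < n) ? lo + tile : n;
        int ext_lo = (lo - steps > 0) ? lo - steps : 0;   /* left shadow  */
        int ext_hi = (hi + steps < n) ? hi + steps : n;   /* right shadow */
        int len = ext_hi - ext_lo;
        double *a = malloc((size_t)len * sizeof *a);
        double *b = malloc((size_t)len * sizeof *b);
        assert(a && b);
        memcpy(a, in + ext_lo, (size_t)len * sizeof *a);
        for (int t = 0; t < steps; ++t) {
            /* The buffer's end cells are never updated, so the band of stale
               values grows by one cell per step; after `steps` steps it has
               not reached [lo, hi). At the global boundary the end cells are
               the fixed boundary values, which are correct. */
            sweep(a, b, len);
            double *tmp = a; a = b; b = tmp;
        }
        memcpy(out + lo, a + (lo - ext_lo), (size_t)(hi - lo) * sizeof *a);
        free(a);
        free(b);
    }
}
```

Because each point inside a tile is computed from the same operands in the same order as in the untiled loop, the tiled result should match stencil_reference exactly. The trade-off is redundant work: neighbouring tiles both recompute the shadow cells, which is the bookkeeping the abstract describes as a burden when written by hand.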


Real-Time Large-Scale Dense 3D Reconstruction with Loop Closure

In the highly active research field of dense 3D reconstruction and modelling, loop closure is still a largely unsolved problem. While a number of previous works show how to accumulate keyframes, globally optimize their pose on closure, and compute a dense 3D model as a post-processing step, in this paper we propose an online framework which delivers a consistent 3D model to the user in real tim...


Run-time thread management for large-scale distributed-memory multiprocessors

Effective thread management is crucial to achieving good performance on large-scale distributed-memory multiprocessors that support dynamic threads. For a given parallel computation with some associated task graph, a thread-management algorithm produces a running schedule as output, subject to the precedence constraints imposed by the task graph and the constraints imposed by the interprocessor ...


Code Refinement of Stencil Codes

A straightforward implementation of an algorithm in a general-purpose programming language usually does not deliver peak performance: Compilers often fail to automatically tune the code for certain hardware peculiarities like memory hierarchy or vector execution units. Manually tuning the code is firstly error-prone as well as time-consuming and secondly taints the code by exposing those peculi...



Journal

Journal title: IEEE Transactions on Parallel and Distributed Systems

Year: 2018

ISSN: 1045-9219

DOI: 10.1109/tpds.2017.2778161